In geographically-distributed systems, communication latencies arenon-negligible. The perceived processing time of a request is thus composed ofthe time needed to route the request to the server and the true processingtime. Once a request reaches a target server, the processing time depends onthe total load of that server; this dependency is described by a load function.We consider a broad class of load functions; we just require that they areconvex and two times differentiable. In particular our model can be applied toheterogeneous systems in which every server has a different load function. Thisapproach allows us not only to generalize results for queuing theory and forbatches of requests, but also to use empirically-derived load functions,measured in a system under stress-testing. The optimal assignment of requeststo servers is communication-balanced, i.e. for any pair of nonperfectly-balanced servers, the reduction of processing time resulting frommoving a single request from the overloaded to underloaded server is smallerthan the additional communication latency. We present a centralized and adecentralized algorithm for optimal load balancing. We prove bounds on thealgorithms' convergence. To the best of our knowledge these bounds were notknown even for the special cases studied previously (queuing theory and batchesof requests). Both algorithms are any-time algorithms. In the decentralizedalgorithm, each server balances the load with a randomly chosen peer. Suchalgorithm is very robust to failures. We prove that the decentralized algorithmperforms locally-optimal steps. Our work extends the currently known results byconsidering a broad class of load functions and by establishing theoreticalbounds on the algorithms' convergence. These results are applicable for serverswhose characteristics under load cannot be described by a standard mathematicalmodels.
展开▼